ABSTRACT

Using the period and mass data of two hundred and seventy-nine extrasolar
planets, we have constructed a coupled period-mass function through a
nonparametric approach. This analytic expression of the coupled period-mass
function has been obtained for the first time in this field. Moreover, due to
a moderate period-mass correlation, the shape of the mass function varies
with period, and the shape of the period function varies with mass. These
mass and period functions lead to two important implications: (1) the deficit
of massive close-in planets is confirmed, and (2) the more massive planets
have larger ranges of possible semi-major axes. These statistical results
will provide important clues for theories of planetary formation.

1. Introduction and Motivation

After the first detection of an extra-solar planet (exoplanet) around a millisecond pulsar
in 1992 (Wolszczan & Frail 1992), the first exoplanet around a Sun-like star, 51 Pegasi b,
was soon reported (Mayor & Queloz 1995). Since then, there has been a continuous flood of
discoveries of extra-solar planets. As of February 2008, more than 200 planets have been
detected around solar-type stars. These discoveries have led to a new era in the study of
planetary systems. For example, the traditional theory for the formation of the Solar System
is unlikely to explain certain structures of extra-solar planetary systems, because the
properties discovered in these systems are quite unlike those of our own. Many detailed
simulations and mechanisms have been proposed to explore these important issues (Jiang &
Ip 2001, Kinoshita & Nakai 2001, Armitage et al. 2002, Ji et al. 2003, Jiang & Yeh 2004a,
Jiang & Yeh 2004b, Boss 2005, Jiang & Yeh 2007, Rice et al. 2008).

As the number of detected exoplanets keeps increasing, the statistical properties of
exoplanets have become more meaningful. For example, assuming that the mass and period
distributions are two independent power-law functions, Tabachnik & Tremaine (2002) used
the maximum likelihood method to determine the best-fit power-law indices. However, the
possibility of a mass-period correlation was not addressed in their work. Zucker & Mazeh
(2002) determined the correlation coefficient between mass and period in logarithmic space
and concluded that the mass-period correlation is significant.

On the other hand, a clustering analysis of the data we have on exoplanets also gives 
some interesting results. Jiang et al. (2006) took a first step into clustering analysis and 
found that the mass distribution is continuous, and the orbital population could be classified 
into three clusters which correspond to the exoplanets in the regimes of tidal, ongoing tidal 
and disc interaction. Marchi (2007) also worked on clustering through different methods. 

To go a step beyond the mass-period distribution function of Tabachnik & Tremaine (2002)
and the mass-period correlation of Zucker & Mazeh (2002), Jiang, Yeh, Chang, & Hung (2007)
(hereafter JYCH07) employed an algorithm to construct a coupled mass-period function
numerically. They were able to include the possible correlation of mass and period in the
distribution function for the first time in this field, and obtained a distribution function
consistent with the correlation they found. In fact, the mass-period distribution obtained
by JYCH07 is what is called the mass-period probability density function (pdf) in
statistics. The integral of the pdf is called the cumulative distribution function (cdf).
We use these terms in this paper.

Although JYCH07 successfully constructed the coupled mass-period pdf numerically,
constraints in the algorithm they employed forced them to use the parametric approach of
the $\beta$-distribution for the pdf fitting. The pdf is a basic characteristic describing
the behavior of the random variables, i.e. mass and period, and is so important that one has
to choose the underlying functional form carefully. One way to address this problem is to
use the nonparametric approach, which is a distribution-free inference, that is, an
inference made without any assumptions regarding the functional form of the underlying
distribution. In addition, the most valuable feature of the nonparametric approach is that
it lets the data speak for themselves. We therefore adopt the nonparametric approach in
this paper.

Moreover, we still take the period-mass coupling into account when the pdf and cdf are
constructed. To make this possible, we employ a method called "Copula Modelling" to obtain
the coupled pdf and cdf of the period and mass of exoplanets. This method is more general
than the one used in JYCH07, so that a nonparametric approach can be used to obtain the
coupled pdf. "Copula Modelling" has a long history of development, but it was too
complicated to be used with real data in practice until Trivedi & Zimmer (2005) clearly
demonstrated a standard modelling procedure.

In §2, we briefly describe the data, and in §3 the nonparametric estimation is carried out.
In §4, we introduce the method of Copula Modelling and demonstrate its credibility. Copula
Modelling is then applied directly to the exoplanet data. The results are described and
discussed in §5. Our main conclusions are given in §6.



2. The Data 

We took samples of exoplanets from The Extrasolar Planets Encyclopaedia
(http://exoplanet.eu/catalog-all.php) on 2008 April 10. Our samples do not include
OGLE235-MOA53b, 2M1207b, GQ Lup b, AB Pic b, SCR 1845b, UScoCTIO108b, or SWEEPS-04 because
either their mass or their period was not listed. The outlier PSR B1620-26b, with a huge
period (100 years), is also excluded.

The orbital periods are taken directly from the table in The Extrasolar Planets
Encyclopaedia. However, only the values of projected mass ($m \sin i$) are listed there,
and only a small fraction of the exoplanets' inclination angles $i$ are known, so we provide
two models of planetary mass in this paper. For the "minimum-mass model", we simply set
$\sin i = 1$ for all planetary systems in the data. For the "guess-mass model", an
inclination angle $i$ within the observational constraint is assigned to a planetary system
through a random process and the mass is then determined accordingly. In this case, if the
inclination angle $i$ is given in The Extrasolar Planets Encyclopaedia for a particular
planet, we simply use its value. If there is no mention of observational constraints, the
angle $i$ is randomly chosen between 0° and 90°. Please note that the unit of period is
days, and the unit of mass is the Jupiter mass ($M_J$).
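
The random assignment in the guess-mass model can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the text says the angle is "randomly chosen between 0° and 90°" but does not name the sampling law, so the code assumes a uniform draw in $i$ itself; the function name `guess_mass` and its arguments are hypothetical.

```python
import numpy as np

def guess_mass(m_sin_i, inc_min_deg=0.0, inc_max_deg=90.0, rng=None):
    """Turn projected masses (m sin i, in M_J) into "guess" masses.

    inc_min_deg / inc_max_deg may be scalars or per-planet arrays holding the
    observational constraint on the inclination; giving the same value for
    both pins a known inclination.
    """
    rng = np.random.default_rng() if rng is None else rng
    m_sin_i = np.asarray(m_sin_i, dtype=float)
    lo = np.broadcast_to(np.asarray(inc_min_deg, dtype=float), m_sin_i.shape)
    hi = np.broadcast_to(np.asarray(inc_max_deg, dtype=float), m_sin_i.shape)
    # Assumption: the inclination is drawn uniformly in the angle i.
    inc_deg = rng.uniform(lo, hi)
    # m = (m sin i) / sin i; the minimum-mass model is the special case i = 90 deg.
    return m_sin_i / np.sin(np.radians(inc_deg))
```

Because $\sin i \le 1$, every guess mass is at least as large as the corresponding minimum mass, and draws close to 0° produce very large masses.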

3. The Nonparametric Approach 

Consider $n$ data points of extrasolar planets in period and mass space, i.e. $(p_1, m_1)$,
$(p_2, m_2)$, ..., $(p_n, m_n)$. $F_P(p)$ and $F_M(m)$ are the cdfs of period and mass, and
$f_P(p)$ and $f_M(m)$ are the pdfs of period and mass, respectively. An estimate of the
cdf, $F_P(p)$, at the point $p$ is the proportion of samples that are less than or equal
to $p$,

$$ \hat{F}_P(p) = \frac{1}{n+1} \sum_{i=1}^{n} I(p_i \le p), \qquad (1) $$

where $I(\cdot)$ is the indicator function defined by

$$ I(p_i \le p) = \begin{cases} 1, & \text{if } p_i \le p, \\ 0, & \text{if } p_i > p. \end{cases} $$

Similarly, the nonparametric estimate of the cdf, $F_M(m)$, at the point $m$ is

$$ \hat{F}_M(m) = \frac{1}{n+1} \sum_{i=1}^{n} I(m_i \le m). \qquad (2) $$

The solid curves in Figure 1(a)-(b) are $\hat{F}_P(p)$ and the minimum-mass model's
$\hat{F}_M(m)$. The dotted curve in Figure 1(b) is the guess-mass model's $\hat{F}_M(m)$.
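
As a minimal illustration of Eqs. (1)-(2), the estimate can be computed by counting the samples that do not exceed the evaluation point. The function name `empirical_cdf` is hypothetical; the $n+1$ denominator follows Eq. (2).

```python
import numpy as np

def empirical_cdf(samples, x):
    """Nonparametric cdf estimate: (number of samples <= x) / (n + 1)."""
    samples = np.sort(np.asarray(samples, dtype=float))
    # side="right" counts every sample that is less than or equal to x.
    counts = np.searchsorted(samples, x, side="right")
    return counts / (samples.size + 1)
```

With the $n+1$ denominator the estimate never reaches 1 at the largest sample, which keeps expressions such as Eq. (4) away from the boundary of the unit square.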

To obtain the analytic expressions for the pdfs $f_P(p)$ and $f_M(m)$, we first plot the
histograms in $p$ and $m$ spaces, as shown in Figure 1(c)-(d). In these two histograms, we
choose the bandwidths $h_P$ and $h_M$ of $p$ and $m$ as follows (Silverman 1986, page 47):

$$ h_P = 0.9 A_P n^{-1/5}, \qquad h_M = 0.9 A_M n^{-1/5}, $$

where

$$ A_P = \min\left\{ S_P, \frac{IQR_P}{1.34} \right\}, \qquad A_M = \min\left\{ S_M, \frac{IQR_M}{1.34} \right\}. $$

$S_P$ ($S_M$) and $IQR_P$ ($IQR_M$) are the standard deviation and interquartile range of
$p_1, \ldots, p_n$ ($m_1, \ldots, m_n$), respectively. Here the interquartile range is the
difference between the first and third quartiles (also see this definition in §5.1). In our
data, $S_P = 896.464$, $IQR_P = 846.360$, and $S_M = 3.499$ ($S_M = 5.235$),
$IQR_M = 2.520$ ($IQR_M = 3.694$) for the minimum-mass model (guess-mass model).
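
The bandwidth rule above can be sketched in a few lines of Python. The function names are hypothetical, and the sample standard deviation (`ddof=1`) and NumPy's default quartile interpolation are assumptions, since the text does not state which conventions were used.

```python
import numpy as np

def silverman_bandwidth(std, iqr, n):
    """h = 0.9 * min(S, IQR / 1.34) * n**(-1/5)  (Silverman 1986, page 47)."""
    spread = min(std, iqr / 1.34)
    return 0.9 * spread * n ** (-0.2)

def bandwidth_from_samples(samples):
    """Same rule, computed directly from the data."""
    x = np.asarray(samples, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    # Assumption: sample standard deviation (ddof=1).
    return silverman_bandwidth(x.std(ddof=1), q3 - q1, x.size)

# Period bandwidth from the summary statistics quoted in the text (n = 279).
h_p = silverman_bandwidth(896.464, 846.360, 279)
```

For the period data the interquartile term is the smaller of the two, so it sets the bandwidth.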

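We then use the adaptive kernel method (Silverman 1986, page 101) to estimate the
pdfs $f_P(p)$ and $f_M(m)$ as follows:

Step 1 Finding the pilot estimates

$$ \tilde{f}_P(p) = \frac{1}{n h_P} \sum_{i=1}^{n} K\left( \frac{p - p_i}{h_P} \right), \qquad \tilde{f}_M(m) = \frac{1}{n h_M} \sum_{i=1}^{n} K\left( \frac{m - m_i}{h_M} \right), $$

where $K(\cdot)$ is the Gaussian kernel, i.e.,

$$ K(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}. $$

Step 2 Defining local bandwidth factors $\lambda_i^P$ and $\lambda_i^M$ by

$$ \lambda_i^P = \left( \frac{\tilde{f}_P(p_i)}{g_P} \right)^{-\alpha}, \qquad \lambda_i^M = \left( \frac{\tilde{f}_M(m_i)}{g_M} \right)^{-\alpha}, $$

where $\alpha$ ($0 \le \alpha \le 1$) is the sensitivity parameter of Silverman (1986), who
suggests $\alpha = 1/2$, and $g_P$ ($g_M$) is the geometric mean of $\tilde{f}_P(p_i)$
($\tilde{f}_M(m_i)$), that is,

$$ \ln g_P = \frac{1}{n} \sum_{i=1}^{n} \ln \tilde{f}_P(p_i), \qquad \ln g_M = \frac{1}{n} \sum_{i=1}^{n} \ln \tilde{f}_M(m_i). $$

Step 3 Obtaining the adaptive kernel estimate $\hat{f}_P(p)$ ($\hat{f}_M(m)$) of the pdf
$f_P(p)$ ($f_M(m)$) by

$$ \hat{f}_P(p) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_P \lambda_j^P} K\left( \frac{p - p_j}{h_P \lambda_j^P} \right), \qquad \hat{f}_M(m) = \frac{1}{n} \sum_{j=1}^{n} \frac{1}{h_M \lambda_j^M} K\left( \frac{m - m_j}{h_M \lambda_j^M} \right). $$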
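
The three steps can be sketched as follows. This is an illustrative sketch rather than the code used for the paper: the function names are hypothetical, and the sensitivity exponent defaults to 1/2, the value suggested by Silverman (1986), which is an assumption here.

```python
import numpy as np

def gaussian_kernel(x):
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)

def adaptive_kde(samples, h, alpha=0.5):
    """Return a callable adaptive kernel estimate of the pdf.

    h is the global bandwidth; alpha is the sensitivity exponent
    (assumed to be 1/2 by default).
    """
    x = np.asarray(samples, dtype=float)
    n = x.size
    # Step 1: pilot estimate evaluated at every data point.
    pilot = gaussian_kernel((x[:, None] - x[None, :]) / h).sum(axis=1) / (n * h)
    # Step 2: local bandwidth factors relative to the geometric mean g.
    log_g = np.mean(np.log(pilot))
    lam = (pilot / np.exp(log_g)) ** (-alpha)
    widths = h * lam
    # Step 3: kernel sum in which each data point carries its own bandwidth.
    def pdf(t):
        t = np.atleast_1d(np.asarray(t, dtype=float))
        z = (t[:, None] - x[None, :]) / widths[None, :]
        return (gaussian_kernel(z) / widths[None, :]).sum(axis=1) / n
    return pdf
```

Points in sparse regions have a low pilot density and therefore receive wider kernels, which smooths the long tails of the period and mass distributions without blurring the dense regions.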



Thus, the analytic expressions are obtained. In order to compare these with the
histograms, we define $f^{*}_{P}(p) = Area_P \times \hat{f}_P(p)$, where $Area_P = 51240.68$
is the area under the histogram of the period in Figure 1(c). Similarly, we set
$f^{*}_{Mm}(m) = Area_{Mm} \times \hat{f}_M(m)$ for the minimum-mass model, where
$Area_{Mm} = 153.06$, and $f^{*}_{Mg}(m) = Area_{Mg} \times \hat{f}_M(m)$ for the guess-mass
model, where $Area_{Mg} = 217.20$. $f^{*}_{P}(p)$ is plotted as the solid curve in Figure
1(c), $f^{*}_{Mm}(m)$ is the solid curve in Figure 1(d), and $f^{*}_{Mg}(m)$ is the dotted
curve in Figure 1(d).



4. The Copula Modelling Method 

In this section, we describe the procedure to construct a new period-mass pdf in which
the possible period and mass correlation is included. The Copula Modelling method, which
is widely used to construct multivariate distributions (Genest & MacKay 1986, Frees &
Valdez 1998, Klugman & Parsa 1999 and Venter et al. 2007), is introduced in the first
part of this section and its credibility is demonstrated in the second part.

4.1. The Procedure and Equations

Here we describe the Copula Modelling method in a way that readers can reproduce 
the results or work on their own applications with the equations provided in this paper. 
However, please refer to Trivedi & Zimmer (2005) for further details of Copula Modelling. 

According to Trivedi & Zimmer (2005), there are several "copula functions" that can be
used, but the Frank copula is more flexible as it allows the two variables of the data to
have negative, zero, or positive correlations (Frank 1979). This suits our work, as we want
all possible kinds of period-mass correlation to be considered in the construction of the
coupled pdf. The Frank copula function is given by

$$ C(u_1, u_2; \theta) = -\frac{1}{\theta} \ln\left[ 1 + \frac{(e^{-\theta u_1} - 1)(e^{-\theta u_2} - 1)}{e^{-\theta} - 1} \right], \qquad (3) $$

where $u_1$, $u_2$ ($0 \le u_1, u_2 \le 1$) are two marginal distribution functions and
$\theta$ ($-\infty < \theta < \infty$) is the dependence parameter. Positive, zero and
negative values of $\theta$ correspond to positive dependence, independence and negative
dependence between the two marginal variables, respectively.

For our work here, $u_1$ is the cdf of period, $F_P(p)$, and $u_2$ is the cdf of mass,
$F_M(m)$. In Copula Modelling, the pdf of the coupled period-mass distribution is

$$ f_{(P,M)}(p, m|\theta) = \frac{\partial^2 C(F_P, F_M; \theta)}{\partial F_P \, \partial F_M} f_P(p) f_M(m) = \frac{-\theta (e^{-\theta} - 1) e^{-\theta F_P(p)} e^{-\theta F_M(m)}}{\{ e^{-\theta} - 1 + (e^{-\theta F_P(p)} - 1)(e^{-\theta F_M(m)} - 1) \}^2} f_P(p) f_M(m). \qquad (4) $$

We now have an analytic form of the coupled period-mass pdf, where the parameter $\theta$
is to be determined through the Maximum Likelihood Method.

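The log-likelihood function of $\theta$ for the samples $(p_i, m_i)$, $i = 1, \ldots, n$
can be written as

$$ \ell(\theta) = \ell_1 + \ell_2(\theta), \qquad (5) $$

where

$$ \ell_1 = \sum_{i=1}^{n} \left[ \ln f_P(p_i) + \ln f_M(m_i) \right], \qquad (6) $$

$$ \ell_2(\theta) = n \ln[-\theta (e^{-\theta} - 1)] - \sum_{i=1}^{n} \left\{ \theta [F_P(p_i) + F_M(m_i)] + 2 \ln\left| e^{-\theta} - 1 + (e^{-\theta F_P(p_i)} - 1)(e^{-\theta F_M(m_i)} - 1) \right| \right\}. \qquad (7) $$

Differentiating $\ell(\theta)$ with respect to $\theta$, we obtain

$$ \frac{d\ell(\theta)}{d\theta} = \frac{d\ell_2(\theta)}{d\theta} = \sum_{i=1}^{n} \left\{ \frac{e^{-\theta} - 1 - \theta e^{-\theta}}{\theta (e^{-\theta} - 1)} - F_P(p_i) - F_M(m_i) + 2 \, \frac{e^{-\theta} + F_P(p_i) e^{-\theta F_P(p_i)} (e^{-\theta F_M(m_i)} - 1) + F_M(m_i) e^{-\theta F_M(m_i)} (e^{-\theta F_P(p_i)} - 1)}{e^{-\theta} - 1 + (e^{-\theta F_M(m_i)} - 1)(e^{-\theta F_P(p_i)} - 1)} \right\}. $$

After the estimates of the cdfs $F_P(p)$ and $F_M(m)$ have been substituted into
$d\ell(\theta)/d\theta$, the estimate of $\theta$ is obtained by solving

$$ \frac{d\ell(\theta)}{d\theta} = 0. $$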
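
A minimal numerical sketch of this estimation step is given below. Instead of solving the derivative equation directly, it maximizes $\ell_2(\theta)$ with a bounded scalar optimizer, which is a swapped-in but equivalent route when the maximum lies inside the search interval. The function names and the search bounds are assumptions. The bracketed quantity in Eq. (7) is negative for $\theta > 0$, so the code takes the logarithm of its square.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def frank_loglik(theta, u, v):
    """theta-dependent part of the log-likelihood, ell_2(theta) in Eq. (7).

    u and v are the cdf values F_P(p_i) and F_M(m_i) of the samples.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if abs(theta) < 1e-10:
        return 0.0  # independence limit: the copula density tends to 1
    a = np.expm1(-theta)                                # e^{-theta} - 1
    d = a + np.expm1(-theta * u) * np.expm1(-theta * v)
    # log(d**2) equals 2 ln|d| and is valid for either sign of d.
    return (u.size * np.log(-theta * a)
            - theta * np.sum(u + v)
            - np.sum(np.log(d ** 2)))

def fit_frank_theta(u, v, bounds=(-30.0, 30.0)):
    """Maximum-likelihood estimate of theta (the search bounds are an assumption)."""
    result = minimize_scalar(lambda t: -frank_loglik(t, u, v),
                             bounds=bounds, method="bounded")
    return result.x
```

Only $\ell_2$ is needed because $\ell_1$ does not depend on $\theta$.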



Moreover, according to Genest (1987), the parameter $\theta$ in Copula Modelling is related
to the Spearman rank-order correlation coefficient ($\rho_S$) through the following formula:

$$ \rho_S \approx \rho_G \equiv (1 - \theta e^{-\theta/2} - e^{-\theta})(e^{-\theta/2} - 1)^{-2}. \qquad (8) $$

We call this $\rho_G$ the Genest correlation coefficient in this paper.
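
Eq. (8) is straightforward to evaluate numerically, as the hedged sketch below shows. The function name is hypothetical, and the small-$\theta$ branch uses the leading-order expansion $\rho_G \approx \theta/6$, which is added here only to avoid a 0/0 evaluation near $\theta = 0$.

```python
import math

def genest_rho(theta):
    """Genest correlation coefficient rho_G of Eq. (8)."""
    if abs(theta) < 1e-3:
        # Leading-order series value, used to avoid 0/0 close to theta = 0.
        return theta / 6.0
    half = math.exp(-theta / 2.0)
    return (1.0 - theta * half - math.exp(-theta)) / (half - 1.0) ** 2
```

Applied to the estimates $\theta = 2.3826$ and $\theta = 2.4565$ reported in §5, this function reproduces the quoted $\rho_G$ values of 0.3792 and 0.3899 to within rounding.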



4.2. The Credibility Test 

Since this is the first time that Copula Modelling has been introduced and employed in
astronomy, we shall demonstrate its credibility. We generate four sets of two hundred
and seventy-nine artificial data points of uniform random variables $x$ and $y$ with
different strengths of $x$-$y$ correlation, as presented in Figure 2(a)-(d). The Spearman
correlation coefficients $\rho_S$ (see §5.2 for the definition) between $x$ and $y$ are
listed in Table 1. We apply Copula Modelling to these four sets of experimental data, where
the nonparametric approach is used to obtain the cdfs of $x$ and $y$. Finally, the coupled
$x$-$y$ pdf, the coupling parameter $\theta$, and $\rho_G$ are obtained. We also calculate
the bootstrap confidence interval (C.I.) for $\theta$ and $\rho_G$ with the number of
bootstrap replications B = 2000 (JYCH07). These results are all listed in Table 1.
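
A confidence interval of this kind can be sketched with a generic resampling helper. The exact bootstrap variant follows JYCH07 and is not spelled out in this paper, so the percentile method used below is an assumption, and the function name and signature are hypothetical.

```python
import numpy as np

def bootstrap_percentile_ci(x, y, statistic, n_boot=2000, level=0.95, rng=None):
    """Percentile bootstrap C.I. for a statistic of paired data (x_i, y_i)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Resample the pairs with replacement so the x-y coupling is preserved.
        idx = rng.integers(0, n, size=n)
        estimates[b] = statistic(x[idx], y[idx])
    tail = 100.0 * (1.0 - level) / 2.0
    lo, hi = np.percentile(estimates, [tail, 100.0 - tail])
    return lo, hi
```

Any estimator of paired data can be passed as `statistic`, for example a routine that returns the fitted $\theta$ or the corresponding $\rho_G$.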



Table 1

Data Set   $\rho_S$   $\theta$   95% C.I. for $\theta$   $\rho_G$   95% C.I. for $\rho_G$
(1)        0.042      0.25       (-0.455, 0.955)         0.042      (-0.076, 0.158)
(2)        0.231      1.42       (0.685, 2.140)          0.233      (0.114, 0.343)
(3)        0.416      2.725      (1.93, 3.535)           0.427      (0.312, 0.534)
(4)        0.788      7.57       (6.355, 8.845)          0.866      (0.798, 0.915)



Because the values of the Spearman correlation coefficient $\rho_S$ are close to $\rho_G$
and within the 95% confidence intervals of $\rho_G$, we confirm that Copula Modelling gives
the correct coupling parameter $\theta$ and Genest correlation coefficient $\rho_G$. Thus,
the coupling between $x$ and $y$ can be correctly included when the pdf is constructed for
any given strength of correlation.

5. Results 

In this section, the results of the coupled period-mass distribution and the correlation 
coefficients will be presented. 

5.1. The Coupled Period-Mass Distribution 

Using Copula Modelling, the estimate of $\theta$ is $\hat{\theta} = 2.3826$ for the
minimum-mass model. Through the bootstrap algorithm described in JYCH07 with the number of
bootstrap replications B = 2000, the standard error of $\hat{\theta}$ is 0.3669. In order
to properly understand the dependence parameter $\theta$, we also obtain the 95% bootstrap
C.I. for $\theta$, which is (1.6514, 3.1190). For the guess-mass model, the estimate of
$\theta$ is $\hat{\theta} = 2.4565$ and its 95% bootstrap C.I. is (1.7282, 3.1633).

Furthermore, in order to check the stability of the guess-mass model, we repeat the
random process to generate 100 guess-mass models and apply Copula Modelling to them.
The average value of $\hat{\theta}$ is 2.9249 with standard deviation 0.3349. We then
employ the interquartile range (Tukey 1977) to check for any outliers of $\hat{\theta}$
among these 100 guess-mass models. The interquartile range is the difference between the
first quartile $Q_1$ and the third quartile $Q_3$, i.e. $IQR = Q_3 - Q_1$. The inner fences
lie 1.5 times the IQR below the first quartile and above the third quartile. The outer
fences lie 3 times the IQR below the first quartile and above the third quartile. The
values lying between the inner and outer fences are called suspected outliers and those
lying beyond the outer fences are called outliers (Hogg & Tanis 2006).

The smallest value, first quartile, median, third quartile and largest value of these 100
$\hat{\theta}$ values, denoted by Min, $Q_1$, Me, $Q_3$, Max, respectively, are

$$ Min = 2.3730, \quad Q_1 = 2.6297, \quad Me = 2.8833, \quad Q_3 = 3.1968, \quad Max = 3.5776. $$

Therefore, $IQR = 0.5671$ and the cutoffs for outliers are

$$ Q_3 + 1.5\,IQR = 4.0475, \quad Q_3 + 3\,IQR = 4.8981, \quad Q_1 - 1.5\,IQR = 1.7791, \quad Q_1 - 3\,IQR = 0.9284. $$

Furthermore, we find that

$$ Q_1 - 1.5\,IQR \,(= 1.7791) < Min \,(= 2.3730) < Max \,(= 3.5776) < Q_3 + 1.5\,IQR \,(= 4.0475). $$

Thus, all 100 $\hat{\theta}$ values of the guess-mass models lie within the inner fences.
This means that no outliers exist among these 100 values, and so the stability of the
guess-mass model is confirmed.
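
The fence arithmetic above can be checked with a few lines of Python. The helper names are hypothetical; the quartiles are the ones quoted in the text.

```python
def tukey_fences(q1, q3):
    """Inner (1.5 IQR) and outer (3 IQR) fences measured from the quartiles."""
    iqr = q3 - q1
    return {
        "inner": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
        "outer": (q1 - 3.0 * iqr, q3 + 3.0 * iqr),
    }

def classify(value, fences):
    """Label a value as 'inside', 'suspected outlier' or 'outlier'."""
    in_lo, in_hi = fences["inner"]
    out_lo, out_hi = fences["outer"]
    if in_lo <= value <= in_hi:
        return "inside"
    if out_lo <= value <= out_hi:
        return "suspected outlier"
    return "outlier"

# Quartiles of the 100 theta estimates quoted in the text.
fences = tukey_fences(2.6297, 3.1968)
labels = [classify(t, fences) for t in (2.3730, 3.5776)]  # Min and Max
```

Since both the minimum and the maximum fall inside the inner fences, every one of the 100 values does.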

Figure 3 shows the three-dimensional view of the coupled period-mass pdf,
$f_{(P,M)}(p, m|\theta)$, of the guess-mass model. The contour of Figure 3 is presented in
Figure 4. The plots of the minimum-mass model's $f_{(P,M)}(p, m|\theta)$ are very similar
to the above, so we have not shown them.

We know that when the period and mass are completely independent,
$f_{(P,M)}(p, m|\theta) = f_P(p) f_M(m)$. Thus the term

$$ \frac{-\theta (e^{-\theta} - 1) e^{-\theta F_P(p)} e^{-\theta F_M(m)}}{\{ e^{-\theta} - 1 + (e^{-\theta F_P(p)} - 1)(e^{-\theta F_M(m)} - 1) \}^2} $$

in Eq. (4) is the one that takes the period-mass coupling into account. We will hereafter
call it the Coupling Factor. To make clear how the Coupling Factor behaves, its value as a
function of $p$ and $m$ for the guess-mass model is plotted in Figure 5. Figure 6 is the
color contour plot. It clearly shows that the Coupling Factor becomes larger than one when
both period and mass are very small or when both of them are large (area of red). It also
shows that the Coupling Factor is less than one in the blue area.
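
The behaviour described above can be reproduced with a short sketch that evaluates the Coupling Factor as a function of the two cdf values. The function name is hypothetical, and $\theta = 2.4565$ is the guess-mass estimate quoted in §5.1.

```python
import numpy as np

def coupling_factor(F_p, F_m, theta):
    """Coupling Factor of Eq. (4) as a function of the cdf values F_P and F_M."""
    a = np.expm1(-theta)                                # e^{-theta} - 1
    num = -theta * a * np.exp(-theta * (F_p + F_m))
    den = (a + np.expm1(-theta * F_p) * np.expm1(-theta * F_m)) ** 2
    return num / den

theta_guess = 2.4565
both_small = coupling_factor(0.0, 0.0, theta_guess)   # short period, low mass
both_large = coupling_factor(1.0, 1.0, theta_guess)   # long period, high mass
mixed = coupling_factor(0.0, 1.0, theta_guess)        # short period, high mass
```

For a positive $\theta$ the factor exceeds one where both cdf values are small or both are large, and drops below one in the mixed corners, which is the pattern of red and blue areas in Figure 6.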



5.2. The Correlation Coefficients 

JYCH07 calculated the linear correlation coefficients (also called Pearson's correlation
coefficients) in both $m$-$p$ and $\ln m$-$\ln p$ spaces and found a weak correlation in
$m$-$p$ space and a moderate correlation in $\ln m$-$\ln p$ space. In order to obtain a
consistent determination of the correlation coefficient, we now calculate the Spearman
rank-order correlation coefficients (Press et al. 1992), which are invariant under strictly
increasing nonlinear transformations (Schweizer and Sklar 2005).

For pairs of quantities $(x_i, y_i)$, $i = 1, \ldots, n$, the linear correlation
coefficient is given by

$$ \rho = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}, \qquad (9) $$

where $\bar{x} = \sum_{i=1}^{n} x_i / n$, $\bar{y} = \sum_{i=1}^{n} y_i / n$. The Spearman
rank-order correlation coefficient is calculated by the above formula with $x_i$ and $y_i$
replaced by their ranks. Given $R(x_i)$ the rank of $x_i$ and $R(y_i)$ the rank of $y_i$,
the Spearman rank-order correlation coefficient can be written as

$$ \rho_S = 1 - \frac{6}{n(n^2 - 1)} \sum_{i=1}^{n} [R(x_i) - R(y_i)]^2. \qquad (10) $$

We note that $|\rho_S| = 1$ indicates a perfect dependence and $\rho_S = 0$ means no
dependence. When $\rho_S = 1$ there is a direct perfect dependence and when $\rho_S = -1$
there is an inverse perfect dependence. Furthermore, according to Cohen (1988),
$0.1 < |\rho_S| < 0.3$ means the correlation is weak, $0.3 < |\rho_S| < 0.5$ indicates a
moderate correlation, and $0.5 < |\rho_S| < 1.0$ is indicative of strong correlation.
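
The rank-based definition, and its invariance under a strictly increasing transformation such as the logarithm, can be illustrated with the following sketch. The function names are hypothetical, and the closed form of Eq. (10) assumes there are no ties in the data.

```python
import numpy as np

def ranks(values):
    """Ranks 1..n of the values (assumes no ties)."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    r = np.empty(values.size, dtype=float)
    r[order] = np.arange(1, values.size + 1)
    return r

def pearson_r(x, y):
    """Linear correlation coefficient of Eq. (9)."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    return np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2))

def spearman_rho(x, y):
    """Spearman rank-order coefficient of Eq. (10)."""
    d = ranks(x) - ranks(y)
    n = d.size
    return 1.0 - 6.0 * np.sum(d ** 2) / (n * (n ** 2 - 1))
```

Taking logarithms of positive data leaves the ranks unchanged, so `spearman_rho(x, y)` and `spearman_rho(np.log(x), np.log(y))` agree, whereas `pearson_r` generally differs between the two spaces.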

For the minimum-mass model, the Spearman rank-order correlation coefficient is obtained
as $\rho_S = 0.3769$. Through Copula Modelling, we also find the estimate of $\rho_G$,
which is $\hat{\rho}_G = 0.3792$. The Spearman rank-order correlation coefficient
$\rho_S = 0.3769$ is thus very close to $\hat{\rho}_G$. Moreover, the 95% bootstrap C.I.
with the number of bootstrap replications B = 2000 for $\rho_G$ is (0.2691, 0.4811). For
the guess-mass model, we have $\hat{\rho}_G = 0.3899$ with a 95% bootstrap C.I.
(0.2811, 0.4869). These results are all consistent and confirm that there is a positive
period-mass correlation for exoplanets.



6. Conclusions

Using the data of exoplanets, for the first time in this field we have constructed an
analytic coupled period-mass function through a nonparametric approach. Moreover, we
calculate the Spearman rank-order correlation coefficient, which gives the same results in
linear and logarithmic spaces, and the results in the previous section show that there is a
moderate positive period-mass correlation.

In order to comprehend the implications of our results, in Figure 7(a)-(b) we plot
$f_{(P,M)}(p, m|\theta)$ with $m = 1, 5, 10, 15 M_J$ (i.e. the period functions given
different masses), and also $f_{(P,M)}(p, m|\theta)$ with $p = 1, 50, 100, 150$ days (i.e.
the mass functions given different periods) in logarithmic space. For comparison,
$f_P(p) \times f_M(m)$ with $m = 1, 5, 10, 15 M_J$ (the independent period functions) and
$f_P(p) \times f_M(m)$ with $p = 1, 50, 100, 150$ days (the independent mass functions) are
also plotted in Figure 7(c)-(d). Of course, the shapes of the independent period functions
with $m = 1, 5, 10, 15 M_J$ are all the same, and the shapes of the independent mass
functions given different periods are all exactly the same as well.

We find that the period function of $m = 1 M_J$ is very similar to the independent
period functions. However, the period functions of $m = 5, 10, 15 M_J$ differ from the
independent ones in that they are lower at the smaller $p$ end and slightly higher at the
larger $p$ end. Thus, for massive planets (say $m = 5, 10, 15 M_J$), the values of the
period function at the large $p$ and small $p$ ends are closer to each other than for
lighter planets (say $m = 1 M_J$). Therefore, the fractions of larger and smaller $p$ (or
semi-major axis) planets are closer for those planets with mass $m = 5, 10, 15 M_J$.

This implies that the more massive planets have larger ranges of possible semi-major 
axes. This interesting statistical result will provide important clues into the theories of 
planetary formation. 

On the other hand, the mass functions of $p = 50, 100, 150$ days are all very similar to
the independent mass functions. However, the mass function of $p = 1$ day differs from
the independent one in that it is higher at the smaller $m$ end and lower at the larger
$m$ end. Thus, the mass function of short period planets (say $p = 1$ day) is steeper
than that of long period planets (say $p = 50, 100, 150$ days). This implies that the
percentage of massive planets is relatively small among the short period planets. This
result reconfirms the deficit of massive close-in planets due to tidal interaction as
studied in Jiang et al. (2003).

Fig. 1. The cumulative distribution function (cdf) and probability density function (pdf)
of planetary period and mass. (a) The period cdf. (b) The mass cdf of the minimum-mass
model (solid curve) and the guess-mass model (dotted curve). (c) The histogram of planets
in $p$ space and also the period pdf $f^{*}_{P}(p)$ (solid curve). (d) The histogram of
planets in $m$ space of the minimum-mass model (solid line) and the guess-mass model
(dotted line), and also the mass pdf of the minimum-mass model $f^{*}_{Mm}(m)$ (solid
curve) and the guess-mass model $f^{*}_{Mg}(m)$ (dotted curve).

Fig. 2. The random variables $x$ and $y$ in the credibility test of Copula Modelling.
(a) $\rho_S = 0.042$. (b) $\rho_S = 0.231$. (c) $\rho_S = 0.416$. (d) $\rho_S = 0.788$.

Fig. 3. The three-dimensional view of the coupled period-mass pdf,
$f_{(P,M)}(p, m|\theta)$, of the guess-mass model.

Fig. 5. The three-dimensional view of the Coupling Factor of the guess-mass model.

Fig. 6. The color contour of the Coupling Factor of the guess-mass model.

Fig. 7. The period and mass functions in logarithmic space. (a) The period functions of
$m = 1 M_J$ (solid curve), $m = 5 M_J$ (dotted curve), $m = 10 M_J$ (short dashed curve),
and $m = 15 M_J$ (long dashed curve). (b) The mass functions of $p = 1$ day (solid curve),
$p = 50$ days (dotted curve), $p = 100$ days (short dashed curve), and $p = 150$ days (long
dashed curve). (c) The independent period functions of $m = 1 M_J$ (solid curve),
$m = 5 M_J$ (dotted curve), $m = 10 M_J$ (short dashed curve), and $m = 15 M_J$ (long
dashed curve). (d) The independent mass functions of $p = 1$ day (solid curve), $p = 50$
days (dotted curve), $p = 100$ days (short dashed curve), and $p = 150$ days (long dashed
curve).



